After the Telomere-to-Telomere (T2T) consortium recently announced the first truly complete sequence of a human genome, scientists have now reported another important milestone in biology: the first comprehensive functional map of every gene in the human genome, built with a novel single-cell sequencing tool. This new map, which links each gene to its job in the cell, has been made freely available to other scientists.
Finding a gene’s function typically involves tremendous effort and can be likened to searching for a needle in a haystack. Armed with the new map, biologists can now screen the database to find new genotype-phenotype relationships without having to run new experiments of their own. That could significantly speed up drug discovery and the development of new gene-based therapies.
Gene perturbations that scale up discovery
To draw the map, the researchers spent years developing a single-cell sequencing method called Perturb-seq. First unveiled in 2016 by scientists at the Whitehead Institute and MIT, the method can easily determine what happens when a gene is turned on or off, revealing its function.
Perturb-seq first uses the CRISPR-Cas9 genome editing tool, sometimes called “molecular scissors” due to its ability to cut and paste genetic sequences, to cause genetic changes in cells. Then, the method uses single-cell RNA sequencing to read the RNA expressed as a result of the genetic “perturbation” introduced earlier. RNAs instruct cells how to behave, such as what protein to produce and when, so Perturb-seq could someday decode most or perhaps even all of the cellular effects caused by genetic changes in human cells.
Initially, Perturb-seq could only be tested on a small set of genes at a high cost. For instance, an earlier version of Perturb-seq was used to investigate how human and viral genes interact over the course of an infection with HCMV, a common herpes virus.
Since then, researchers led by Jonathan Weissman at the Whitehead Institute, working with colleagues from Princeton University and 10x Genomics, have devised a new and improved version of Perturb-seq that can be scaled up across the entire human genome.
In the new study, Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of the present paper, led the effort to scale up the method for the entire genome. By combining CRISPR and single-cell sequencing, the team used Perturb-seq to profile more than 2.5 million cells from human blood cancer cell lines and noncancerous cells derived from the retina, building a comprehensive map linking genotypes to phenotypes.
“It’s a big resource in the way the human genome is a big resource, in that you can go in and do discovery-based research,” said Weissman, who is also a professor of biology at the Massachusetts Institute of Technology (MIT) and investigator with the Howard Hughes Medical Institute. “Rather than defining ahead of time what biology you’re going to be looking at, you have this map of the genotype-phenotype relationships and you can go in and screen the database without having to do any experiments.”
Probing genes without an experiment?
One way to understand the function of a gene is to use molecular biology techniques to knock it out and observe the effects. Another approach is to computationally identify the function of a gene by looking at other human genes with a similar sequence or those with shared ancestry for comparison. But sometimes there is no human gene available for comparison, and scientists have to compare human genes to those of other species.
The main advantage of Perturb-seq is that you can use the dataset in an unbiased way to look at genes with unknown functions without having to go through the grueling process of knocking out genes one at a time.
For instance, the researchers could compare unknown genes to known ones that have similar transcriptional outcomes, which suggests they are part of the same larger complex. Using this method, the researchers singled out a gene called C7orf26, which stood out because disrupting it produced phenotypes similar to those seen after removing genes that are part of a protein complex called Integrator. Previous research showed that the Integrator complex is composed of many smaller subunits, and the researchers have now confirmed that C7orf26 is its 15th component. These 15 subunits work together in smaller modules, each performing specific functions within the Integrator complex.
“Absent this thousand-foot-high view of the situation, it was not so clear that these different modules were so functionally distinct,” Reuben Saunders, a graduate student in Weissman’s lab and co-first author of the paper, said in a statement.
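The idea of matching an unknown gene to known genes by their transcriptional outcomes can be illustrated with a small sketch. This is not the researchers' pipeline: the gene names, the five-value "signatures" and the choice of Pearson correlation as the similarity measure are all assumptions made for illustration, and the numbers are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy, made-up signatures (NOT real data): each list holds the change in
# expression of the same five readout genes after one gene is perturbed.
# The gene names are hypothetical placeholders.
signatures = {
    "known_complex_gene_1": [1.0, 0.8, -0.5, 0.1, -1.2],
    "known_complex_gene_2": [0.9, 1.0, -0.4, 0.0, -1.0],
    "unrelated_gene": [-0.7, 0.2, 1.1, -0.9, 0.3],
}
unknown_gene = [1.1, 0.7, -0.6, 0.2, -1.1]

def rank_by_similarity(query, reference):
    """Return (name, correlation) pairs, most similar signature first."""
    scores = [(name, pearson(query, sig)) for name, sig in reference.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

ranking = rank_by_similarity(unknown_gene, signatures)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy example the unknown gene's signature tracks the two hypothetical complex members closely and runs opposite to the unrelated gene. That is the kind of pattern that would prompt a closer look at whether the unknown gene belongs to the same complex.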
Another strong point is that Perturb-seq’s data, which is gathered at the single-cell level, can be used to assess more complex phenotypes that become muddied when measurements from many cells are averaged together.
“We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” Weissman said. “But sometimes when you knock down a gene, different cells that are losing that same gene behave differently, and that behavior may be missed by the average.”
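A short sketch shows how an average can hide what individual cells are doing. The expression values and the cutoff of 15.0 are invented for illustration. A real analysis would more likely use clustering or other statistical methods than a hand-picked threshold.

```python
from statistics import mean

# Made-up expression values (arbitrary units) for one readout gene.
control_cells = [10.0, 10.5, 9.5, 10.2, 9.8]
# Every cell carries the same knockdown, but in this toy example two cells
# respond strongly while three barely change.
knockdown_cells = [10.1, 9.9, 10.0, 20.0, 20.0]

def split_by_threshold(values, threshold):
    """Split cells into responders (above threshold) and non-responders."""
    responders = [v for v in values if v > threshold]
    non_responders = [v for v in values if v <= threshold]
    return responders, non_responders

# The population average suggests a moderate increase in every cell.
bulk_average = mean(knockdown_cells)

# The single-cell view shows two distinct groups instead.
responders, non_responders = split_by_threshold(knockdown_cells, threshold=15.0)

print(f"control average:   {mean(control_cells):.1f}")
print(f"knockdown average: {bulk_average:.1f}")
print(f"responders:        {len(responders)} cells, mean {mean(responders):.1f}")
print(f"non-responders:    {len(non_responders)} cells, mean {mean(non_responders):.1f}")
```

Here the average sits between the two groups and describes neither of them. The per-cell values show that some cells roughly doubled their expression and the rest stayed near the control level.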
In one application of their dataset, the researchers looked at how mitochondria responded to stress. A mitochondrion is an organelle that is essential to the proper functioning of the cell. Often called the ‘powerhouse of the cell’, mitochondria produce most of the cell’s chemical energy in the form of ATP (adenosine triphosphate), which powers most biological processes in the body. One unique feature of mitochondria is that they contain their own DNA, called mitochondrial DNA (mtDNA), whereas the rest of a cell’s DNA is found in the nucleus (nDNA).
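There are 13 protein-coding genes in the mitochondrial genome, but the nDNA holds more than 1,000 genes linked to mitochondrial function. When various mitochondria-related genes were perturbed, the nuclear genome responded in a similar way to many different genetic changes, whereas the responses of the mitochondrial genome were much more variable.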
“People have been interested for a long time in how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed,” Replogle said in a statement.
“There’s still an open question of why mitochondria still have their own DNA,” he added. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors.”
Perturb-seq has proven itself a very powerful tool in the field of genomics, and this is just the tip of the iceberg. The research team plans to enhance its map of gene functions by exploring other types of cells besides the cell lines used in this study.
“This really is the culmination of many years of work by the authors and other collaborators, and I’m really pleased to see it continue to succeed and expand,” said Tom Norman, a co-senior author of the paper, who now leads a lab at Memorial Sloan Kettering Cancer Center.
The findings appeared in the journal Cell.